MoJo: A Distance Metric for Software Clusterings

Authors

  • Vassilios Tzerpos
  • Richard C. Holt
Abstract

The software clustering problem has attracted much attention recently, since it is an integral part of the process of reverse engineering large software systems. A key problem in this research is the difficulty of comparing different approaches objectively. In this paper, we present a metric that can be used to evaluate the similarity of two different decompositions of a software system. Our metric calculates a distance between two partitions of the same set of software resources. We begin by introducing the model we use. We then present a heuristic algorithm that calculates the distance efficiently. Finally, we discuss some experiments that showcase the performance of the algorithm and the effectiveness of the metric.
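As a rough illustration of what a distance between two partitions of the same resources can look like, here is a minimal Python sketch. It relies on the usual description of MoJo as the smallest number of Move operations (relocating one resource to another or a new cluster) and Join operations (merging two clusters) needed to turn one partition into the other, taken over both directions. The abstract does not spell out the heuristic, so the greedy tagging below is an assumption chosen for simplicity, not the authors' algorithm. It only yields an upper bound on the distance, and the function names and the dict-based input format are hypothetical.

```python
from collections import Counter


def one_way_upper_bound(a, b):
    """Upper bound on the Move/Join operations needed to turn partition a into b.

    a and b map each resource name to a cluster label (hypothetical format).
    """
    if a.keys() != b.keys():
        raise ValueError("both partitions must cover the same resources")

    # Count, for every cluster of a, how many of its resources fall in each cluster of b.
    overlap = {}
    for resource, a_label in a.items():
        overlap.setdefault(a_label, Counter())[b[resource]] += 1

    kept = 0      # resources that stay where they are
    tags = set()  # clusters of b chosen as a target by at least one cluster of a
    for counts in overlap.values():
        # Greedy choice: tag the cluster with the b-cluster it overlaps most.
        b_label, size = max(counts.items(), key=lambda kv: (kv[1], str(kv[0])))
        kept += size
        tags.add(b_label)

    moves = len(a) - kept             # every resource outside its tag is moved once
    joins = len(overlap) - len(tags)  # clusters sharing a tag are merged
    return moves + joins


def mojo_upper_bound(a, b):
    """Symmetric variant: the smaller of the two one-way bounds."""
    return min(one_way_upper_bound(a, b), one_way_upper_bound(b, a))


if __name__ == "__main__":
    a = {"x1": "A1", "x2": "A1", "x3": "A2", "x4": "A2"}
    b = {"x1": "B1", "x2": "B1", "x3": "B1", "x4": "B2"}
    print(mojo_upper_bound(a, b))
```

In this toy input, moving `x3` from `A2` into `A1` is enough to reproduce the second partition, so the bound is 1. Because the tagging is greedy, larger inputs may report more operations than the true minimum.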

Similar resources

An optimal algorithm and extensions for the MoJo distance measure

A problem that the software industry frequently faces is the maintenance and improvement of legacy software systems. Though most legacy software systems are still working well, their structure is often no longer understood. When one wants to migrate a legacy software system to a new operating system or different programming language, or to add to its functionality, it is essential to recover th...

Annotation-based Distance Measures for Patient Subgroup Discovery in Clinical Microarray Studies

MOTIVATION Clustering algorithms are widely used in the analysis of microarray data. In clinical studies, they are often applied to find groups of co-regulated genes. Clustering, however, can also stratify patients by similarity of their gene expression profiles, thereby defining novel disease entities based on molecular characteristics. Several distance-based cluster algorithms have been sugge...

Spatially-Aware Comparison and Consensus for Clusterings

This paper proposes a new distance metric between clusterings that incorporates information about the spatial distribution of points and clusters. Our approach builds on the idea of a Hilbert space-based representation of clusters as a combination of the representations of their constituent points. We use this representation and the underlying metric to design a spatially-aware consensus cluste...

Software component capture using graph clustering

We describe a simple, fast computing and easy to implement method for finding relatively good clusterings of software systems. Our method relies on the ability to compute the strength of an edge in a graph by applying a straightforward metric defined in terms of the neighborhoods of its end vertices. The metric is used to identify the weak edges of the graph, which are momentarily deleted to br...

Weighted Ensemble Clustering for Increasing the Accuracy of the Final Clustering

Clustering algorithms are highly dependent on different factors such as the number of clusters, the specific clustering algorithm, and the used distance measure. Inspired from ensemble classification, one approach to reduce the effect of these factors on the final clustering is ensemble clustering. Since weighting the base classifiers has been a successful idea in ensemble classification, in th...

Publication date: 1999